Are Microcontrollers Ready for Deep Learning-Based Human Activity Recognition?
The last decade has seen exponential growth in the field of deep learning, with deep learning on microcontrollers emerging as a new frontier for this research area. This paper presents a case study of machine learning on microcontrollers, focusing on human activity recognition using accelerometer data. We build machine learning classifiers suitable for execution on modern microcontrollers and evaluate their performance. Specifically, we compare Random Forests (RF), a classical machine learning technique, with Convolutional Neural Networks (CNN) in terms of classification accuracy and inference speed. The results show that RF classifiers achieve similar levels of classification accuracy while being several times faster than a small custom CNN model designed for the task. Both the RF and the custom CNN are also several orders of magnitude faster than state-of-the-art deep learning models. On the one hand, these findings confirm the feasibility of using deep learning on modern microcontrollers. On the other hand, they cast doubt on whether deep learning is the best approach for this application, especially if high inference speed and, thus, low energy consumption is the key objective.
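As a rough illustration of the RF side of such a comparison, the sketch below trains a Random Forest on simple per-axis statistical features of synthetic accelerometer windows. The data, features, and hyperparameters are assumptions for illustration, not the paper's setup:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for windowed tri-axial accelerometer data:
# class 0 ~ low-variance ("still"), class 1 ~ high-variance ("walking").
def make_windows(n, scale):
    return rng.normal(0.0, scale, size=(n, 3, 50))

X_raw = np.concatenate([make_windows(200, 0.1), make_windows(200, 1.0)])
y = np.array([0] * 200 + [1] * 200)

# Cheap per-axis features (mean and standard deviation), the kind of
# hand-crafted input an RF on a microcontroller typically consumes.
X = np.concatenate([X_raw.mean(axis=2), X_raw.std(axis=2)], axis=1)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"accuracy: {acc:.2f}")
```

A small forest like this needs only comparisons and table lookups at inference time, which is why it can be several times faster than even a small CNN on constrained hardware.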
MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset
Misinformation is becoming increasingly prevalent on social media and in news
articles. It has become so widespread that we require algorithmic assistance
utilising machine learning to detect such content. Training these machine
learning models requires datasets of sufficient scale, diversity and quality.
However, datasets in the field of automatic misinformation detection are
predominantly monolingual, include a limited amount of modalities and are not
of sufficient scale and quality. Addressing this, we develop a data collection
and linking system (MuMiN-trawl), to build a public misinformation graph
dataset (MuMiN), containing rich social media data (tweets, replies, users,
images, articles, hashtags) spanning 21 million tweets belonging to 26 thousand
Twitter threads, each of which have been semantically linked to 13 thousand
fact-checked claims across dozens of topics, events and domains, in 41
different languages, spanning more than a decade. The dataset is made available
as a heterogeneous graph via a Python package (mumin). We provide baseline
results for two node classification tasks related to the veracity of a claim
involving social media, and demonstrate that these are challenging tasks, with
the highest macro-average F1-score being 62.55% and 61.45% for the two tasks,
respectively. The MuMiN ecosystem is available at
https://mumin-dataset.github.io/, including the data, documentation, tutorials
and leaderboards.
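The macro-average F1 reported above averages the per-class F1 scores, so a rare class counts as much as the majority class. A minimal sketch with scikit-learn, using made-up labels rather than MuMiN results:

```python
from sklearn.metrics import f1_score

# Illustrative predictions for a binary claim-veracity task
# (these labels are invented, not drawn from MuMiN).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Macro-averaging computes F1 separately per class, then takes the
# unweighted mean, so minority-class performance is not drowned out.
macro_f1 = f1_score(y_true, y_pred, average="macro")
print(f"macro F1: {macro_f1:.4f}")  # → 0.7500
```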
The Safety Challenges of Deep Learning in Real-World Type 1 Diabetes Management
Blood glucose simulation allows the effectiveness of type 1 diabetes (T1D)
management strategies to be evaluated without patient harm. Deep learning
algorithms provide a promising avenue for extending simulator capabilities;
however, these algorithms are limited in that they do not necessarily learn
physiologically correct glucose dynamics and can learn incorrect and
potentially dangerous relationships from confounders in training data. This is
likely to be more important in real-world scenarios, as data is not collected
under strict research protocol. This work explores the implications of using
deep learning algorithms trained on real-world data to model glucose dynamics.
Free-living data was processed from the OpenAPS Data Commons and supplemented
with patient-reported tags of challenging diabetes events, constituting one of
the most detailed real-world T1D datasets. This dataset was used to train and
evaluate state-of-the-art glucose simulators, comparing their prediction error
across safety-critical scenarios and assessing the physiological
appropriateness of the learned dynamics using Shapley Additive Explanations
(SHAP). While deep learning prediction accuracy surpassed the widely used
mathematical simulator approach, the model deteriorated in safety-critical
scenarios and struggled to leverage self-reported meal and exercise
information. SHAP value analysis also indicated that the model had fundamentally
confused the roles of insulin and carbohydrates, contradicting one of the most
basic principles of T1D management. This work highlights the importance of considering
physiological appropriateness when using deep learning to model real-world
systems in T1D and healthcare more broadly, and provides recommendations for
building models that are robust to real-world data constraints.
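The insulin-carbohydrate confusion described above can arise from confounding in free-living data: insulin doses track meal sizes, so when meals go unlogged a model can attribute glucose rises to insulin. A minimal synthetic sketch of that failure mode, with illustrative data and coefficients rather than the paper's simulators or the OpenAPS dataset:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Synthetic free-living-style data: insulin is dosed in
# proportion to the carbohydrates being eaten.
carbs = rng.uniform(0, 100, n)
insulin = carbs / 10 + rng.normal(0, 0.5, n)

# True physiology: carbs raise glucose, insulin lowers it.
glucose_delta = 2.0 * carbs - 15.0 * insulin + rng.normal(0, 5, n)

# Meals go unlogged, so the model sees only insulin. Because insulin
# tracks carbs, it becomes a proxy for the confounder it was dosed for.
X = np.column_stack([insulin, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, glucose_delta, rcond=None)

# The learned insulin effect comes out positive, i.e. physiologically
# backwards: the model "believes" insulin raises glucose.
print(f"learned insulin coefficient: {coef[0]:.2f}")
```

Attribution tools such as SHAP surface exactly this kind of inverted relationship, which is why the paper uses them to audit physiological appropriateness.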
Online Feature Selection for Activity Recognition using Reinforcement Learning with Multiple Feedback
Recent advances in both machine learning and Internet-of-Things have
attracted attention to automatic Activity Recognition, where users wear a
device with sensors whose outputs are mapped to a predefined set of
activities. However, few studies have considered the balance between wearable
power consumption and activity recognition accuracy. This is particularly
important when part of the computational load happens on the wearable device.
In this paper, we present a new methodology to perform feature selection on the
device based on Reinforcement Learning (RL) to find the optimum balance between
power consumption and accuracy. To accelerate the learning speed, we extend the
RL algorithm to address multiple sources of feedback, and use them to tailor
the policy in conjunction with estimating the feedback accuracy. We evaluated
our system on the SPHERE challenge dataset, a publicly available research
dataset. The results show that our proposed method achieves a good trade-off
between wearable power consumption and activity recognition accuracy.
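The trade-off above can be illustrated with a plain ε-greedy bandit over candidate feature subsets, rewarding accuracy minus a power penalty. This is a simplified sketch, not the paper's multi-feedback RL algorithm, and the accuracy and power numbers are assumed, not taken from the SPHERE dataset:

```python
import random

random.seed(0)

# Candidate feature subsets (arms), each with an assumed accuracy
# and power cost -- illustrative numbers only.
arms = {
    "accel_only":       {"acc": 0.80, "power": 0.2},
    "accel+gyro":       {"acc": 0.88, "power": 0.6},
    "accel+gyro+audio": {"acc": 0.90, "power": 1.0},
}
LAMBDA = 0.25   # weight of power consumption in the reward
EPSILON = 0.1   # exploration rate

names = list(arms)
counts = {a: 0 for a in names}
values = {a: 0.0 for a in names}   # running mean reward per arm

def reward(arm):
    # Noisy feedback: observed accuracy fluctuates per episode.
    acc = arms[arm]["acc"] + random.gauss(0, 0.02)
    return acc - LAMBDA * arms[arm]["power"]

for _ in range(2000):
    if random.random() < EPSILON:
        arm = random.choice(names)          # explore
    else:
        arm = max(names, key=values.get)    # exploit
    r = reward(arm)
    counts[arm] += 1
    values[arm] += (r - values[arm]) / counts[arm]

best = max(names, key=values.get)
print(f"selected subset: {best}")
```

With these assumed numbers the cheapest subset wins once the power penalty is factored in, even though the richer subsets are more accurate; changing LAMBDA shifts the selected operating point.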
Addressing contingency in algorithmic (mis)information classification: Toward a responsible machine learning agenda
Machine learning (ML) enabled classification models are becoming increasingly
popular for tackling the sheer volume and speed of online misinformation and
other content that could be identified as harmful. In building these models,
data scientists need to take a stance on the legitimacy, authoritativeness and
objectivity of the sources of "truth" used for model training and testing.
This has political, ethical and epistemic implications which are rarely
addressed in technical papers. Despite (and due to) their reported high
accuracy and performance, ML-driven moderation systems have the potential to
shape online public debate and create downstream negative impacts such as undue
censorship and the reinforcing of false beliefs. Using collaborative
ethnography and theoretical insights from social studies of science and
expertise, we offer a critical analysis of the process of building ML models
for (mis)information classification: we identify a series of algorithmic
contingencies--key moments during model development that could lead to
different future outcomes, uncertainty and harmful effects as these tools are
deployed by social media platforms. We conclude by offering a tentative path
toward reflexive and responsible development of ML tools for moderating
misinformation and other harmful content online.
Comment: Andrés Domínguez Hernández, Richard Owen, Dan Saattrup Nielsen and Ryan McConville. 2023. Addressing contingency in algorithmic (mis)information classification: Toward a responsible machine learning agenda. Accepted at the 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT '23), June 12-15, 2023, Chicago, United States of America. ACM, New York, NY, USA, 16 pages.